Memory Compression

Since a replay buffer stores a large number of transitions, memory efficiency is one of its most important properties.

cpprb provides two optional features, next_of and stack_compress, which you can enable when constructing a replay buffer.

next_of and stack_compress can be used together, but currently neither of them is compatible with the N-step replay buffer.

These memory compressions rely on the internal memory alignment, so they cannot be used in situations where sequential steps are not stored sequentially (e.g. distributed reinforcement learning).

1 next_of

1.1 Overview

In reinforcement learning, training usually uses the pair of observations before and after a certain action, so both are saved in the replay buffer together. Naively, every observation is therefore stored twice.

A replay buffer is a ring buffer, so the "next" value of index i is already stored at index i+1, except at the newest edge.

If you specify the next_of argument (whose type is str or an array-like of str), the "next values" of the specified names are created in the replay buffer automatically, and they share memory with the originals.

The name of a next value is the original name with the prefix next_ (e.g. next_obs for obs, next_rew for rew, and so on).

This functionality incurs small penalties for manipulating sampled indices and for checking the cache at the newest index. (As far as I know, this penalty is not significant, and you will likely not notice it.)

1.2 Example Usage

import numpy as np
from cpprb import ReplayBuffer

buffer_size = 256

rb = ReplayBuffer(buffer_size,
                  {"obs": {"shape": (84,84)},
                   "act": {"shape": 3},
                   "rew": {},
                   "done": {}},
                  next_of="obs") # Do not specify "next_obs" manually; it is created automatically.
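The memory-sharing idea behind next_of can be illustrated with a minimal pure-NumPy ring buffer. This is only a sketch of the concept, not cpprb's actual implementation; all names and sizes here are made up for illustration.

```python
import numpy as np

# Sketch of the next_of idea: store each observation only once in a
# ring buffer; the "next" observation of index i is simply the entry
# at index i+1 (modulo the buffer size), so nothing is duplicated.
capacity = 4
obs_buf = np.zeros((capacity, 2))  # toy 2-dimensional observations

for step in range(6):  # write more steps than capacity to force wraparound
    obs_buf[step % capacity] = step

def next_obs(index):
    """The next observation shares storage: just read the following slot."""
    return obs_buf[(index + 1) % capacity]

# After 6 writes, slot 0 holds step 4 and slot 1 holds step 5,
# so the "next" of slot 0 is the contents of slot 1.
assert np.all(next_obs(0) == obs_buf[1])
```

A real implementation also has to special-case the newest index, whose "next" value has not been written into the ring yet; that is exactly the cache described in the Technical Detail section below.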


1.3 Notes

cpprb does not check the consistency of the i-th next_foo and the (i+1)-th foo. This is the user's responsibility.

Since next_foo is automatically generated, you must not specify it in the constructor manually.

1.4 Technical Detail

Internally, next_foo is not stored in the ring buffer itself but in its cache. (An error is still raised if you do not pass next_foo to add.)

When sampling next_foo, the sampled indices (a numpy.ndarray) are shifted by one (wrapping around if necessary), then checked against the newest edge of the ring buffer. For indices on the edge, the cached value is extracted instead.
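The index manipulation described above can be sketched in a few lines of NumPy. The buffer size and index values below are hypothetical, chosen only to show the shift, the wraparound, and the edge check:

```python
import numpy as np

# Illustrative sketch (not cpprb's real code) of sampling-time index
# handling: shift sampled indices by one, wrap them around the ring
# buffer, and detect the newest edge where the cached value is needed.
buffer_size = 8
newest_index = 5  # hypothetical position of the most recently added item

indices = np.array([3, 5, 7])          # sampled indices
shifted = (indices + 1) % buffer_size  # "next" indices, wrapped around
on_edge = indices == newest_index      # these must use the cached value

assert list(shifted) == [4, 6, 0]
assert list(on_edge) == [False, True, False]
```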

2 stack_compress

2.1 Overview

stack_compress is designed for compressing stacked (or sliding-window) observations. A famous use case is Atari video games, where 4 frames of the display are treated as a single observation and the next observation slides by only 1 frame (e.g. frames 1-4, 2-5, 3-6, ...). In this example, a straightforward approach stores every frame 4 times.
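A back-of-the-envelope calculation shows the scale of that duplication. The frame size matches the Atari example; the transition count is a hypothetical round number:

```python
# Rough arithmetic for the Atari example: 4 stacked 84x84 uint8 frames
# per observation, over a hypothetical 1,000,000 stored transitions.
frame_bytes = 84 * 84          # one uint8 frame
stacked = 4 * frame_bytes      # naive storage per stacked observation
transitions = 1_000_000

naive = stacked * transitions            # every frame stored ~4 times
compressed = frame_bytes * transitions   # roughly one new frame per step

# Sliding by one frame means naive storage is about 4x larger.
assert naive // compressed == 4
```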

cpprb with stack_compress avoids storing duplicated frames within stacked observations (except at the end edge of the internal ring buffer) by utilizing a NumPy striding trick.

You can specify the stack_compress parameter, whose type is str or an array-like of str, in the constructor.

2.2 Sample Usage

The following sample code stores 4-stacked frames of 16x16 data as a single observation.

import numpy as np
from cpprb import ReplayBuffer

rb = ReplayBuffer(32,
                  {"obs": {"shape": (16,16,4)}, "rew": {}, "done": {}},
                  next_of="obs", stack_compress="obs")


2.3 Notes

For compatibility with OpenAI Gym, the last dimension is treated as the stack dimension (which does not match C array memory order).

For the sake of performance, cpprb does not check that the overlapping data are truly identical; it simply overwrites them with new data. Users must not specify stack_compress for non-stacked data.

2.4 Technical Detail

Technically speaking, numpy.ndarray (and any other type supporting the buffer protocol) has properties describing its item data type, number of dimensions, length of each dimension, memory stride of each dimension, and so on. Normally, distinct elements never overlap in memory; stack_compress, however, intentionally overlaps memory addresses along the stacked dimension.
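The overlapping-stride idea can be demonstrated with NumPy's as_strided. This is a sketch of the underlying mechanism, not cpprb's internals; the toy one-number "frames" stand in for real observations:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Demonstration of overlapping memory along the stacked dimension:
# build 4-frame sliding windows over a 1-D sequence of toy "frames"
# without copying any frame.
frames = np.arange(10, dtype=np.float64)
stack = 4
windows = as_strided(frames,
                     shape=(len(frames) - stack + 1, stack),
                     strides=(frames.strides[0], frames.strides[0]))

# windows[i] is frames[i:i+4]; consecutive windows share 3 of 4 entries.
assert windows.shape == (7, 4)
assert list(windows[0]) == [0, 1, 2, 3]
assert list(windows[1]) == [1, 2, 3, 4]

# No copy was made: the windows view the same memory as frames.
assert np.shares_memory(windows, frames)
```

Note that writing through such a view modifies every window that shares the affected memory, which is why cpprb simply overwrites overlapped data rather than validating it.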